The Promise Of The WAIS Protocol

Emerging Standard Represents First Step Toward Unifying Data Search & Retrieval


BY JASON LEVITT

It doesn't take an expert to see that the state of modern information handling is neither open nor unified. A trip to the main library at the University of Texas at Austin, one of the top 10 college library systems in the U.S., confirms this.

The primary card catalog resides on an IBM mainframe that is accessible through various synchronous block-mode terminals scattered about the main library, and also via modem through a rather crude dial-up facility.

In the main reference room, an OCLC (On-line Computer Library Center) terminal allows access to other university card catalogs; several IBM PCs are available for searching CD-ROMs for bibliographical citations and abstracts on a variety of subjects; and a LEXIS/NEXIS terminal can be used for researching major U.S. court decisions. In the engineering library, an IBM PC with a CD-ROM drive is available for searching U.S. patents.


WAIS Server Source File


(:source
 :version 3

 :ip-name "nextbox.utoday.com"
 :tcp-port 5001
 :database-name "UTTECH"
 :cost 0.00
 :cost-unit :free

 :maintainer "jason@nextbox.utoday.com"
 :description "Server created with WAIS release 8 b2 on Mon Nov 18 16:54:19 1991 by jason@nextbox.utoday.com

UNIX Today! technology articles by Jason Levitt

The files of type text used in the index were:
/LocalLibrary/WAIS/articles/ABCstory.txt
/LocalLibrary/WAIS/articles/AIX3.1FS.txt
/LocalLibrary/WAIS/articles/Benchinfo.txt
/LocalLibrary/WAIS/articles/LPFstory.txt
/LocalLibrary/WAIS/articles/MacXstory.txt
/LocalLibrary/WAIS/articles/Solbourne.txt
/LocalLibrary/WAIS/articles/SunStory.txt
/LocalLibrary/WAIS/articles/XSerialArticle.txt
/LocalLibrary/WAIS/articles/Xarticle.txt
/LocalLibrary/WAIS/articles/Xcontrib.txt
"
)


Figure 1


If one were to compare information accessibility at this facility to computer resource accessibility, things here are still in the early 1980s or late '70s. Each of the systems mentioned is primarily standalone and proprietary, with its own information retrieval and organizational formats and little, if any, interoperability between databases.

While the monster mainframe card catalog might provide pointers to many sources, it is ignorant of most other on-line sources and almost never provides the most current information on a subject, despite the best efforts of its administrators. What these information-handling systems need is a dose of open systems standards and technology, the same technology that is changing the face of modern computing.

Enter WAIS, for Wide-Area Information Server, a fledgling step in the overwhelming effort needed to unify information search and retrieval technology. WAIS is an emerging open systems standard protocol for the query and retrieval of information. WAIS, pronounced "ways," is the brainchild of Brewster Kahle, an employee of Thinking Machines Corp. (TMC), the No. 2 supercomputer manufacturer behind Cray and purveyor of fine, massively parallel systems.

The basis for WAIS is the rapidly growing electronic-publishing movement, which is seeing more and more material, usually available only in book form, "published" or placed onto electronic media such as disk and tape, where it can be accessed with a computer.


WAIS TECHNOLOGY

WAIS is a protocol for the transmission of query and retrieval information, much like the information you would use to search a library card catalog. It is, in fact, an extension to an existing protocol standard called Z39.50, the Information Retrieval Service Definitions and Protocol Specification for Library Applications.

The Z39.50 standard was created by a group called NISO, the National Information Standards Organization, and is designed for use in electronic library card catalogs. Z39.50 essentially specifies formats for search requests directed at a database and formats for document retrieval requests.

WAIS extends the Z39.50 standard to allow, among other things, discrete portions of documents, called "chunks," to be retrieved. This is especially useful in low-bandwidth situations such as serial links, where transferring an entire document in response to a query would be prohibitively time-consuming.
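
To put rough numbers on the bandwidth argument, here is a small Python sketch. It assumes fixed-size byte chunks, a hypothetical scheme chosen only for illustration (the article does not say how WAIS servers delimit chunks), and an asynchronous serial line that spends about 10 bits per byte (8 data bits plus start and stop bits). The names `get_chunk` and `transfer_seconds` are illustrative, not part of any WAIS distribution.

```python
def get_chunk(document: bytes, index: int, chunk_size: int = 2048) -> bytes:
    """Return the index-th fixed-size chunk of a document.

    Hypothetical chunking scheme: real WAIS servers may delimit chunks
    differently (for example by line or by mail message).
    """
    if index < 0 or chunk_size <= 0:
        raise ValueError("index must be >= 0 and chunk_size > 0")
    start = index * chunk_size
    return document[start:start + chunk_size]


def transfer_seconds(num_bytes: int, bits_per_second: int = 9600,
                     bits_per_byte: int = 10) -> float:
    """Estimate serial-link transfer time, assuming ~10 line bits per byte
    (8 data bits + start bit + stop bit) and no protocol overhead."""
    return num_bytes * bits_per_byte / bits_per_second


if __name__ == "__main__":
    document = b"x" * 100_000          # a hypothetical 100,000-byte article
    chunk = get_chunk(document, 3)     # only the fourth 2,048-byte chunk
    print(f"whole document: {transfer_seconds(len(document)):.1f} s")
    print(f"one chunk:      {transfer_seconds(len(chunk)):.1f} s")
```

Under these assumptions, the whole 100,000-byte document ties up a 9,600-bps link for about 104 seconds, while a single 2,048-byte chunk arrives in about 2 seconds.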

The WAIS protocol fits neatly at the top of the ISO 7-layer protocol model, at the application and presentation layers. This makes it extremely portable to differing network environments such as TCP/IP and X.25.

Like any good open standard, the WAIS protocol does not specify or limit the technology at either end of the wire. A WAIS client can be as simple as a command line interface that takes a database name, network address and query string as input, or as complex as a combination spreadsheet and database that constantly updates in real time based on client/server activity taking place in the background. The only condition is that the client and server exchange query and retrieval information using the WAIS protocol.
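
At the simple end of that range, a command-line client needs only the three inputs named above. The Python sketch below shows the shape of such a tool, with one loud caveat: instead of the real WAIS encoding based on Z39.50, it sends a made-up one-line text request (`database<TAB>query`), so it illustrates the client's inputs and control flow, not the protocol itself. The default port 5001 is borrowed from the source file in Figure 1.

```python
import argparse
import socket


def build_request(database: str, query: str) -> bytes:
    # Hypothetical text framing for illustration only; real WAIS messages
    # are Z39.50-style protocol data units, not tab-separated text.
    return f"{database}\t{query}\n".encode("utf-8")


def ask(host: str, port: int, database: str, query: str,
        timeout: float = 10.0) -> str:
    """Send one query and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(database, query))
        conn.shutdown(socket.SHUT_WR)      # tell the server we are done sending
        replies = []
        while True:
            data = conn.recv(4096)
            if not data:                   # server closed the connection
                break
            replies.append(data)
    return b"".join(replies).decode("utf-8", errors="replace")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Toy WAIS-style command-line client (sketch)")
    parser.add_argument("host", help="network address of the server")
    parser.add_argument("database", help="database name, e.g. UTTECH")
    parser.add_argument("query", help="query string")
    parser.add_argument("--port", type=int, default=5001)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(ask(args.host, args.port, args.database, args.query))
```

Saved under a hypothetical name such as toy_wais_client.py, it would be invoked as `python toy_wais_client.py nextbox.utoday.com UTTECH xremote`.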

The free WAIS source code, discussed later, implements a very typical client/server model for Unix-based Internet applications. The server creates a socket attached to a well-known port and waits on it. Clients attach to the port using the port number and network address of the server machine. The server accepts a request, forks a child process to handle it, and then continues to wait for and service other requests.
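
The accept-and-fork pattern described above maps directly onto Python's standard `socketserver.ForkingTCPServer`, which forks a child per connection on Unix systems. The sketch below is a stand-in under stated assumptions, not waisserver itself: its handler reads one line of text and replies with an acknowledgment instead of decoding real WAIS requests, and the port 5001 comes from Figure 1.

```python
import socketserver


class QueryHandler(socketserver.StreamRequestHandler):
    """Runs in a forked child process, once per client connection."""

    def handle(self):
        # Hypothetical one-line text request; a real WAIS server would
        # decode a binary protocol message here and search its indexes.
        query = self.rfile.readline().decode("utf-8", errors="replace").strip()
        self.wfile.write(f"received query: {query}\n".encode("utf-8"))


def make_server(host: str = "", port: int = 5001) -> socketserver.ForkingTCPServer:
    # The listening socket is bound to a well-known port; each accepted
    # connection is handed to a child created with fork(), while the
    # parent goes back to waiting for the next request.
    return socketserver.ForkingTCPServer((host, port), QueryHandler)


if __name__ == "__main__":
    with make_server() as server:
        server.serve_forever()
```

Because each request runs in its own process, a slow search in one child does not block the parent from accepting the next client.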

Requests for information are largely governed by special text files maintained by the WAIS server, called "sources," that vaguely resemble library catalog cards. Figure 1 shows a source I created containing 10 of my previous technology articles for UNIX Today! The source structure holds enough information (network address, TCP port number and database name) for any other machine on the network running a WAIS client to locate, understand and access the information in the database.
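
Because a source is just a parenthesized list of keyword/value pairs, a client can read one with very little code. The following Python sketch parses the flat `(:source ...)` layout shown in Figure 1 into a dictionary. It is an illustrative reader written for this example, not the parser from the WAIS distribution; it assumes a single level of parentheses and leaves backslash escapes inside strings unprocessed.

```python
import re

# Tokens: a double-quoted string (may span lines), a parenthesis, or a bare atom.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[()]|[^\s()"]+')


def _atom(token: str):
    """Convert one token to a Python value."""
    if token.startswith('"'):
        return token[1:-1]          # strip quotes; escapes are left as-is
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token                    # keyword or symbol, e.g. ":free"


def parse_source(text: str) -> dict:
    """Parse a flat (:source :key value ...) description into a dict."""
    tokens = _TOKEN.findall(text)
    if tokens[:2] != ["(", ":source"] or tokens[-1] != ")":
        raise ValueError("not a (:source ...) description")
    body = tokens[2:-1]
    if len(body) % 2:
        raise ValueError("keyword without a value")
    fields = {}
    for key, value in zip(body[0::2], body[1::2]):
        if not key.startswith(":"):
            raise ValueError(f"expected a :keyword, got {key!r}")
        fields[key[1:]] = _atom(value)
    return fields
```

A client could then read `fields["ip-name"]`, `fields["tcp-port"]` and `fields["database-name"]` to decide where to connect and which database to name in its query.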
Not surprisingly, WAIS is already being used to connect archive sites on the Internet running on various Unix-based machines as well as proprietary systems such as Macintosh and NeXT. According to Brewster Kahle, there are approximately 80 sites running public WAIS servers and many more running WAIS privately within corporations and academia. A FidoNet WAIS server site, running SLIP over a 9,600-bps serial link, was recently added to this collection of public sites.


On-Line WAIS Discussions And Development

alt.wais newsgroup on USENET

Join mailing lists by sending e-mail to:

wais-discussion-request@think.com - Weekly digest of mail from users and developers
wais-interest-request@think.com - Infrequent announcements of new releases
wais-talk-request@think.com - Developers' mailing list

Free WAIS client software

Clients for NeXT, X, Macintosh, Unix ASCII, GNU Emacs and Motif.
Anonymous FTP to think.com in the directory /wais

Clients for VMS, MS-DOS, Novell LAN Workplace and SunView.
Anonymous FTP to samba.oit.unc.edu in the directory /pub/wais/UNC

Free WAIS server software

Servers for NeXT and various Unix platforms
Anonymous FTP to think.com in the directory /wais

FREE WAIS SOFTWARE

I like software that you can use to get some meaningful work done quickly without having to dig too deeply into documentation. The freely available WAIS software fits that description. In the UNIX Today! labs, I decided to put together a small heterogeneous network and run WAIS.


General WAIS Information

Thinking Machines Corp.
1010 El Camino Real, Ste. 310
Menlo Park, CA 94025
415-329-9300 Fax: 415-329-9329

Bibliography of available WAIS documents.
Send electronic mail to: barbara@think.com

Accessing a WAIS client on the Internet

Telnet to quake.think.com, login as wais

Getting involved with the Nat'l Public Network

Electronic Frontier Foundation
155 Second Street
Cambridge, MA 02141
617-864-0665
E-mail: eff@eff.org

Acting as the WAIS server system (and also a client) was a NeXTstation. Attached over Ethernet were a Macintosh running MacOS and a Sun 3/60 running SunOS 4.1. The free WAIS software included NeXT and Mac binaries and complete source code for the Unix systems, in this case the Sun. I dug out my archives of personal Unix electronic mail, about 10 Mbytes' worth, and used the indexing program included with the WAIS server to create a hashed database. I did the same with 10 of my old technology articles written for UNIX Today! The databases, or "sources," are listed in Figure 2.

The WAIS indexing program knows about the format of many common types of structured on-line data, such as electronic mail, netnews, PICT/GIF/TIFF files and biology abstract formats, and it also handles straight ASCII text.
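
Knowing a file's format is what lets the indexer split one file into several separately retrievable documents. A minimal Python sketch, assuming Unix mbox-style archives in which each message begins with a line starting with `From ` (the WAIS indexer's actual format rules may differ), might dispatch on file type like this; the type labels `"mail"` and `"text"` are hypothetical.

```python
def split_mail_archive(text: str) -> list[str]:
    """Split an mbox-style archive into individual messages.

    Assumes each message starts with a line beginning "From " (the Unix
    mailbox separator) and that body lines beginning "From " have been
    escaped as ">From " by the mail system.
    """
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))   # close the previous message
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def split_documents(text: str, kind: str) -> list[str]:
    """Return the separately retrievable documents contained in one file."""
    if kind == "mail":
        return split_mail_archive(text)
    if kind == "text":
        return [text]            # plain ASCII text: the whole file is one document
    raise ValueError(f"unsupported format: {kind!r}")
```

Each message returned this way can be indexed and retrieved on its own, so a query never has to drag a whole multi-megabyte mail archive across the network.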

There was also a database of WAIS documentation, created automatically by the server program, and a directory of all the sources I created, called "directory of information," that simply points to all the databases. After creating the databases, I ran the WAIS server program, called waisserver, on the NeXTstation, where it sits and waits for incoming WAIS client requests.

Once the waisserver was running, I could access it using the clients, called WAISstations. On the Sun, which was running X/Motif, I chose to use the Motif client. I also used the Mac and NeXT WAISstations. In order to access a waisserver, I first had to set up my sources. Figure 3 shows a source setup window for the Mac client. I had named my database of articles "UTTECH" on the WAIS server. The access method, "Contact," was MacTCP, Apple's TCP/IP protocol stack.

As Figure 2 shows, I decided to search my mail archives and technology articles for references to NCD's Xremote protocol. The results appear in the scrolling list. If a result is an entire file, such as the article contained in the file "XSerialArticle.txt," the path name for the file is listed after it.

The other results in the list are individual E-mail messages that actually reside in several large text files on the WAIS server. Because the WAIS indexing program understands E-mail format, it was able to index individual E-mail messages in my E-mail archive files and transfer only those messages pertinent to the client query.

When I click on a document in the Results window, the portion of the result most relevant to my query appears in another window. The waisserver takes a simplistic approach to interpreting my request for information about Xremote: it looks for the word "xremote" (the search is case-insensitive) in mail messages and headers and displays the matching documents and mail messages in the Results window. This turns out to be adequate as long as you put meaningful words in your query.

Figure 2: Mac WAIS client shown with results of a search for "xremote"

Figure 3: WAIS client set up to use Mac TCP/IP and the database UTTECH
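
The simple matching described above, a case-insensitive word lookup over messages with their headers included, can be sketched with an inverted index: a hash table from each lowercased word to the documents that contain it, which is one plausible reading of the "hashed database" the indexer builds. Ranking by raw hit count is an assumption made for illustration; the article does not say how waisserver orders its results, and the document IDs in the usage line are made up.

```python
import re
from collections import Counter, defaultdict

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    # Lowercasing before matching is what makes the search case-insensitive.
    return WORD.findall(text.lower())


def build_index(documents: dict[str, str]) -> dict[str, Counter]:
    """Map each word to a Counter of {doc_id: occurrences}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs, best first; score = total query-word hits."""
    scores: Counter = Counter()
    for word in tokenize(query):
        scores.update(index.get(word, Counter()))
    return scores.most_common()
```

For example, `search(build_index({"XSerialArticle.txt": text, "mail#12": message}), "XRemote")` would list every document mentioning "xremote" in any capitalization, most mentions first. As the article notes, a query made of meaningless words would match almost everything.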

TMC has a much more sophisticated searching mechanism in its Internet server, quake.think.com; however, the search source code is not freely available.

One of the key features of the WAIS protocol is its support for secondary search criteria. In Figure 2, the criteria would be entered by copying a result, or a chunk of a result, to the "which are similar to" window. A subsequent search would use any words contained in that window as additional search criteria. Repeating that process can quickly refine the search parameters.
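
A rough sketch of that refinement loop follows, under the assumption that the client simply merges the most frequent words of the copied chunk into the next query. The article does not describe how the server weights those words, and the stop-word list and the `max_new_words` cap are hypothetical choices.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z0-9]+")
# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "for", "on", "with"}


def refine_query(query: str, similar_to: str, max_new_words: int = 5) -> list[str]:
    """Return the original query words plus the most frequent new words
    from the text pasted into the "which are similar to" window."""
    base = WORD.findall(query.lower())
    counts = Counter(w for w in WORD.findall(similar_to.lower())
                     if w not in STOP_WORDS and w not in base)
    extra = [word for word, _ in counts.most_common(max_new_words)]
    return base + extra
```

Feeding the returned word list back in as the next query, and repeating, narrows the results toward documents that resemble the chunk the user picked.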

AN OPEN END

The next version of the WAIS protocol should be officially folded into the Z39.50 standard this month and is expected to include multimedia support and integral support for English-language queries. These enhancements should add considerable clout to WAIS, given the infant state of commercial multimedia query/retrieval technology.

WAIS software is freely available from a number of sites. Unfortunately, the WAIS client program can only be obtained via anonymous FTP at this time, which means you must have direct Internet access.

The WAIS server and X-based client program for Unix are available on uunet.uu.net in the directory /networking/distrib-is/wais.

My small network experiment with WAIS only touched on its full potential; however, for my small database needs, it was quite useful. The free WAIS software is, like the MIT X software, meant as refer-

Continued on page 48